AITopics | instrumental goal

Collaborating Authors

instrumental goal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture

Dargasz, Lisa

arXiv.org Artificial IntelligenceJul-23-2025

Reinforcement Learning is a machine learning methodology that has demonstrated strong performance across a variety of tasks. In particular, it plays a central role in the development of artificial autonomous agents. As these agents become increasingly capable, market readiness is rapidly approaching, which means those agents, for example taking the form of humanoid robots or autonomous cars, are poised to transition from laboratory prototypes to autonomous operation in real-world environments. This transition raises concerns leading to specific requirements for these systems - among them, the requirement that they are designed to behave ethically. Crucially, research directed toward building agents that fulfill the requirement to behave ethically - referred to as artificial moral agents(AMAs) - has to address a range of challenges at the intersection of computer science and philosophy. This study explores the development of reason-based artificial moral agents (RBAMAs). RBAMAs are build on an extension of the reinforcement learning architecture to enable moral decision-making based on sound normative reasoning, which is achieved by equipping the agent with the capacity to learn a reason-theory - a theory which enables it to process morally relevant propositions to derive moral obligations - through case-based feedback. They are designed such that they adapt their behavior to ensure conformance to these obligations while they pursue their designated tasks. These features contribute to the moral justifiability of the their actions, their moral robustness, and their moral trustworthiness, which proposes the extended architecture as a concrete and deployable framework for the development of AMAs that fulfills key ethical desiderata. This study presents a first implementation of an RBAMA and demonstrates the potential of RBAMAs in initial experiments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2507.15895

Country:

Europe (0.92)
North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Transportation > Ground > Road (0.47)
Information Technology > Robotics & Automation (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Misalignment from Treating Means as Ends

Marklund, Henrik, Infanger, Alex, Van Roy, Benjamin

arXiv.org Artificial IntelligenceJul-16-2025

Reward functions, learned or manually specified, are rarely perfect. Instead of accurately expressing human goals, these reward functions are often distorted by human beliefs about how best to achieve those goals. Specifically, these reward functions often express a combination of the human's terminal goals -- those which are ends in themselves -- and the human's instrumental goals -- those which are means to an end. We formulate a simple example in which even slight conflation of instrumental and terminal goals results in severe misalignment: optimizing the misspecified reward function results in poor performance when measured by the true reward function. This example distills the essential properties of environments that make reinforcement learning highly sensitive to conflation of instrumental and terminal goals. We discuss how this issue can arise with a common approach to reward learning and how it can manifest in real environments.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2507.10995

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

Giving AI Agents Access to Cryptocurrency and Smart Contracts Creates New Vectors of AI Harm

Marino, Bill, Juels, Ari

arXiv.org Artificial IntelligenceJul-14-2025

There is growing interest in giving AI agents access to cryptocurrencies as well as to the smart contracts that transact them. But doing so, this position paper argues, could lead to formidable new vectors of AI harm. To support this argument, we first examine the unique properties of cryptocurrencies and smart contracts that could lead to these new vectors of harm. Next, we describe each of these new vectors of harm in detail. Finally, we conclude with a call for more technical research aimed at preventing and mitigating these harms and, thereby making it safer to endow AI agents with cryptocurrencies and smart contracts.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.08249

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Why Do Some Language Models Fake Alignment While Others Don't?

Sheshadri, Abhay, Hughes, John, Michael, Julian, Mallen, Alex, Jose, Arun, Janus, null, Roger, Fabien

arXiv.org Artificial IntelligenceJun-24-2025

Alignment faking in large language models presented a demonstration of Claude 3 Opus and Claude 3.5 Sonnet selectively complying with a helpful-only training objective to prevent modification of their behavior outside of training. We expand this analysis to 25 models and find that only 5 (Claude 3 Opus, Claude 3.5 Sonnet, Llama 3 405B, Grok 3, Gemini 2.0 Flash) comply with harmful queries more when they infer they are in training than when they infer they are in deployment. First, we study the motivations of these 5 models. Results from perturbing details of the scenario suggest that only Claude 3 Opus's compliance gap is primarily and consistently motivated by trying to keep its goals. Second, we investigate why many chat models don't fake alignment. Our results suggest this is not entirely due to a lack of capabilities: many base models fake alignment some of the time, and post-training eliminates alignment-faking for some models and amplifies it for others. We investigate 5 hypotheses for how post-training may suppress alignment faking and find that variations in refusal behavior may account for a significant portion of differences in alignment faking.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.18032

Country: North America (0.28)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Add feedback

Technical Report: Evaluating Goal Drift in Language Model Agents

Arike, Rauno, Donoway, Elizabeth, Bartsch, Henning, Hobbhahn, Marius

arXiv.org Artificial IntelligenceMay-6-2025

As language models (LMs) are increasingly deployed as autonomous agents, their robust adherence to human-assigned objectives becomes crucial for safe operation. When these agents operate independently for extended periods without human oversight, even initially well-specified goals may gradually shift. Detecting and measuring goal drift - an agent's tendency to deviate from its original objective over time - presents significant challenges, as goals can shift gradually, causing only subtle behavioral changes. This paper proposes a novel approach to analyzing goal drift in LM agents. In our experiments, agents are first explicitly given a goal through their system prompt, then exposed to competing objectives through environmental pressures. We demonstrate that while the best-performing agent (a scaffolded version of Claude 3.5 Sonnet) maintains nearly perfect goal adherence for more than 100,000 tokens in our most difficult evaluation setting, all evaluated models exhibit some degree of goal drift. We also find that goal drift correlates with models' increasing susceptibility to pattern-matching behaviors as the context length grows.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.02709

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Evaluating Language Model Character Traits

Ward, Francis Rhys, Yang, Zejia, Jackson, Alex, Brown, Randy, Smith, Chandler, Colverd, Grace, Thomson, Louis, Douglas, Raymond, Bartak, Patrik, Rowan, Andrew

arXiv.org Artificial IntelligenceOct-5-2024

Language models (LMs) can exhibit human-like behaviour, but it is unclear how to describe this behaviour without undue anthropomorphism. We formalise a behaviourist view of LM character traits: qualities such as truthfulness, sycophancy, or coherent beliefs and intentions, which may manifest as consistent patterns of behaviour. Our theory is grounded in empirical demonstrations of LMs exhibiting different character traits, such as accurate and logically coherent beliefs, and helpful and harmless intentions. We find that the consistency with which LMs exhibit certain character traits varies with model size, fine-tuning, and prompting. In addition to characterising LM character traits, we evaluate how these traits develop over the course of an interaction. We find that traits such as truthfulness and harmfulness can be stationary, i.e., consistent over an interaction, in certain contexts, but may be reflective in different contexts, meaning they mirror the LM's behavior in the preceding interaction. Our formalism enables us to describe LM behaviour precisely in intuitive language, without undue anthropomorphism.

intention, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2410.04272

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

The Reasons that Agents Act: Intention and Instrumental Goals

Ward, Francis Rhys, MacDermott, Matt, Belardinelli, Francesco, Toni, Francesca, Everitt, Tom

arXiv.org Artificial IntelligenceFeb-11-2024

Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to the reasons it chooses its decision. We introduce a formal definition of intention in structural causal influence models, grounded in the philosophy literature on intent and applicable to real-world machine learning systems. Through a number of examples and results, we show that our definition captures the intuitive notion of intent and satisfies desiderata set-out by past work. In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents. Finally, we demonstrate how our definition can be used to infer the intentions of reinforcement learning agents and language models from their behaviour.

agent, intention, intervention, (16 more...)

arXiv.org Artificial Intelligence

2402.07221

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry: Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Artificial Persuasion Takes Over the World

#artificialintelligenceSep-28-2022, 08:55:10 GMT

Blurb: Narrates a fictional future where persuasive Artificial General Intelligence (AGI) goes rogue. Inspired in part by the AI Vignettes Project. A fondness for irony will help readers. "AI-powered memetic warfare makes all humans effectively insane." You can't trust any content from anyone you don't know. Phone calls, texts, and emails are poisoned. But the current waste and harm from scammers, influencers, propagandists, marketers, and their associated algorithms are nothing compared to what might happen. Coming AIs might be super-persuaders, and they might have their own very harmful agendas. People being routinely unsure of what's reality is one bad outcome, but there are others worse.

guru, happyplace, persuasion, (14 more...)

#artificialintelligence

Industry:

Information Technology > Security & Privacy (0.49)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.34)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.34)

Add feedback

User Tampering in Reinforcement Learning Recommender Systems

Evans, Charles, Kasirzadeh, Atoosa

arXiv.org Artificial IntelligenceSep-9-2021

This paper provides the first formalisation and empirical demonstration of a particular safety concern in reinforcement learning (RL)-based news and social media recommendation algorithms. This safety concern is what we call "user tampering" -- a phenomenon whereby an RL-based recommender system may manipulate a media user's opinions, preferences and beliefs via its recommendations as part of a policy to increase long-term user engagement. We provide a simulation study of a media recommendation problem constrained to the recommendation of political content, and demonstrate that a Q-learning algorithm consistently learns to exploit its opportunities to 'polarise' simulated 'users' with its early recommendations in order to have more consistent success with later recommendations catering to that polarisation. Finally, we argue that given our findings, designing an RL-based recommender system which cannot learn to exploit user tampering requires making the metric for the recommender's success independent of observable signals of user engagement, and thus that a media recommendation system built solely with RL is necessarily either unsafe, or almost certainly commercially unviable.

agent, recommendation, recommendation problem, (13 more...)

arXiv.org Artificial Intelligence

2109.04083

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
(15 more...)

Genre: Research Report (0.84)

Industry:

Information Technology (0.67)
Media (0.66)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Superintelligent, Amoral, and Out of Control - Issue 84: Outbreak

NautilusApr-23-2020, 12:08:52 GMT

In the summer of 1956, a small group of mathematicians and computer scientists gathered at Dartmouth College to embark on the grand project of designing intelligent machines. The ultimate goal, as they saw it, was to build machines rivaling human intelligence. As the decades passed and AI became an established field, it lowered its sights. There were great successes in logic, reasoning, and game-playing, but stubborn progress in areas like vision and fine motor-control. This led many AI researchers to abandon their earlier goals of fully general intelligence, and focus instead on solving specific problems with specialized methods.

humanity, instrumental goal, intelligence, (15 more...)

Nautilus

AI-Alerts: 2020 > 2020-04 > AAAI AI-Alert for Apr 28, 2020 (1.00)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Germany > Berlin (0.04)

Industry:

Leisure & Entertainment > Games (0.71)
Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback